Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 2226382 |
| Missing cells | 2640112 |
| Missing cells (%) | 9.9% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 203.8 MiB |
| Average record size in memory | 96.0 B |
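The derived figures in this table follow from the raw counts. As a quick sanity check, the sketch below recomputes the missing-cell percentage and the average record size using only the numbers shown above. The variable names are chosen here for illustration and are not part of the report.

```python
# Figures copied from the dataset statistics table.
n_variables = 12
n_observations = 2_226_382
missing_cells = 2_640_112
total_size_mib = 203.8

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both results agree with the reported 9.9% and 96.0 B after rounding to one decimal place.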
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 1 |
| Text | 3 |
Alerts
| Alert | Type |
|---|---|
| bath is highly overall correlated with bed and 2 other fields | High correlation |
| bed is highly overall correlated with bath and 1 other field | High correlation |
| house_size is highly overall correlated with bath and 2 other fields | High correlation |
| price is highly overall correlated with bath and 1 other field | High correlation |
| bed has 481317 (21.6%) missing values | Missing |
| bath has 511771 (23.0%) missing values | Missing |
| acre_lot has 325589 (14.6%) missing values | Missing |
| house_size has 568484 (25.5%) missing values | Missing |
| prev_sold_date has 734297 (33.0%) missing values | Missing |
| price is highly skewed (γ₁ = 546.3030625) | Skewed |
| bed is highly skewed (γ₁ = 56.65481293) | Skewed |
| bath is highly skewed (γ₁ = 152.4149966) | Skewed |
| acre_lot is highly skewed (γ₁ = 106.2802845) | Skewed |
| house_size is highly skewed (γ₁ = 1286.9001) | Skewed |
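Skewness values this large usually point to a few extreme observations, not to a broadly asymmetric distribution. The sketch below shows the effect with the population form of skewness (third central moment divided by the 1.5 power of the second). The data are hypothetical toy values; only 473 comes from the report, as the maximum of bed. The profiler may use a bias-corrected estimator, so this illustrates the mechanism without reproducing the reported numbers.

```python
from statistics import fmean

def skewness(values):
    """Population skewness: m3 / m2**1.5, with central moments m2 and m3."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical, symmetric toy sample of bedroom counts.
typical = [1, 2, 2, 3, 3, 3, 4, 4, 5]
# Same sample plus one extreme value (473 is the reported maximum of bed).
with_outlier = typical + [473]

print(skewness(typical))
print(skewness(with_outlier))
```

The symmetric sample has zero skewness, while a single extreme value pushes the statistic close to the largest value it can take for a sample of this size.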
Reproduction
| Analysis started | 2024-09-02 21:18:25.087308 |
|---|---|
| Analysis finished | 2024-09-02 21:19:24.387923 |
| Duration | 59.3 seconds |
| Software version | ydata-profiling v4.9.0 |
| Download configuration | config.json |
brokered_by
Real number (ℝ)
| Distinct | 110143 |
|---|---|
| Distinct (%) | 5.0% |
| Missing | 4533 |
| Missing (%) | 0.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 52939.893 |
| Minimum | 0 |
| Maximum | 110142 |
| Zeros | 12 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 8485 |
| Q1 | 23861 |
| median | 52884 |
| Q3 | 79183 |
| 95-th percentile | 105405 |
| Maximum | 110142 |
| Range | 110142 |
| Interquartile range (IQR) | 55322 |
Descriptive statistics
| Standard deviation | 30642.753 |
|---|---|
| Coefficient of variation (CV) | 0.57882158 |
| Kurtosis | -1.1244379 |
| Mean | 52939.893 |
| Median Absolute Deviation (MAD) | 27089 |
| Skewness | 0.14739295 |
| Sum | 1.1762445 × 10¹¹ |
| Variance | 9.389783 × 10⁸ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 22611 | 45658 | 2.1% |
| 16829 | 27732 | 1.2% |
| 53016 | 21709 | 1.0% |
| 23592 | 9176 | 0.4% |
| 30807 | 8464 | 0.4% |
| 33714 | 6928 | 0.3% |
| 57595 | 6410 | 0.3% |
| 84534 | 5502 | 0.2% |
| 109978 | 5365 | 0.2% |
| 109914 | 5231 | 0.2% |
| Other values (110133) | 2079674 | 93.4% |
| Value | Count | Frequency (%) |
| 0 | 12 | < 0.1% |
| 1 | 5 | < 0.1% |
| 2 | 9 | < 0.1% |
| 3 | 2 | < 0.1% |
| 4 | 6 | < 0.1% |
| 5 | 4 | < 0.1% |
| 6 | 5 | < 0.1% |
| 7 | 2 | < 0.1% |
| 8 | 277 | < 0.1% |
| 9 | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| 110142 | 8 | < 0.1% |
| 110141 | 6 | < 0.1% |
| 110140 | 3 | < 0.1% |
| 110139 | 1 | < 0.1% |
| 110138 | 62 | < 0.1% |
| 110137 | 1 | < 0.1% |
| 110136 | 1 | < 0.1% |
| 110135 | 23 | < 0.1% |
| 110134 | 1 | < 0.1% |
| 110133 | 15 | < 0.1% |
status
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.0 MiB |
| for_sale | 1389306 |
|---|---|
| sold | 812009 |
| ready_to_build | 25067 |
Length
| Max length | 14 |
|---|---|
| Median length | 8 |
| Mean length | 6.6086691 |
| Min length | 4 |
Characters and Unicode
| Total characters | 14713422 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | for_sale |
|---|---|
| 2nd row | for_sale |
| 3rd row | for_sale |
| 4th row | for_sale |
| 5th row | for_sale |
Common Values
| Value | Count | Frequency (%) |
| for_sale | 1389306 | 62.4% |
| sold | 812009 | 36.5% |
| ready_to_build | 25067 | 1.1% |
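The length and character totals reported for status can be derived from these three counts alone. The following sketch recomputes the share of each label, the total character count and the mean label length; the dictionary name is an illustrative choice, not something taken from the report.

```python
# Counts copied from the Common Values table for status.
status_counts = {"for_sale": 1_389_306, "sold": 812_009, "ready_to_build": 25_067}

total = sum(status_counts.values())
shares = {value: round(100 * count / total, 1) for value, count in status_counts.items()}

# Each label contributes len(label) characters per occurrence.
total_chars = sum(len(value) * count for value, count in status_counts.items())
mean_length = total_chars / total

print(total, shares)
print(total_chars, round(mean_length, 7))
```

The totals match the 2226382 observations, the 14713422 total characters and the mean length of about 6.61 listed for this variable.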
Most occurring characters
| Value | Count | Frequency (%) |
| o | 2226382 | 15.1% |
| l | 2226382 | 15.1% |
| s | 2201315 | 15.0% |
| _ | 1439440 | 9.8% |
| r | 1414373 | 9.6% |
| a | 1414373 | 9.6% |
| e | 1414373 | 9.6% |
| f | 1389306 | 9.4% |
| d | 862143 | 5.9% |
| y | 25067 | 0.2% |
| Other values (4) | 100268 | 0.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 13273982 | 90.2% |
| Connector Punctuation | 1439440 | 9.8% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| o | 2226382 | 16.8% |
| l | 2226382 | 16.8% |
| s | 2201315 | 16.6% |
| r | 1414373 | 10.7% |
| a | 1414373 | 10.7% |
| e | 1414373 | 10.7% |
| f | 1389306 | 10.5% |
| d | 862143 | 6.5% |
| y | 25067 | 0.2% |
| t | 25067 | 0.2% |
| Other values (3) | 75201 | 0.6% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 1439440 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 13273982 | 90.2% |
| Common | 1439440 | 9.8% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| o | 2226382 | 16.8% |
| l | 2226382 | 16.8% |
| s | 2201315 | 16.6% |
| r | 1414373 | 10.7% |
| a | 1414373 | 10.7% |
| e | 1414373 | 10.7% |
| f | 1389306 | 10.5% |
| d | 862143 | 6.5% |
| y | 25067 | 0.2% |
| t | 25067 | 0.2% |
| Other values (3) | 75201 | 0.6% |
Common
| Value | Count | Frequency (%) |
| _ | 1439440 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 14713422 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| o | 2226382 | 15.1% |
| l | 2226382 | 15.1% |
| s | 2201315 | 15.0% |
| _ | 1439440 | 9.8% |
| r | 1414373 | 9.6% |
| a | 1414373 | 9.6% |
| e | 1414373 | 9.6% |
| f | 1389306 | 9.4% |
| d | 862143 | 5.9% |
| y | 25067 | 0.2% |
| Other values (4) | 100268 | 0.7% |
price
Real number (ℝ)
HIGH CORRELATION  SKEWED 
| Distinct | 102137 |
|---|---|
| Distinct (%) | 4.6% |
| Missing | 1541 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 524195.52 |
| Minimum | 0 |
| Maximum | 2.1474836 × 10⁹ |
| Zeros | 280 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 30000 |
| Q1 | 165000 |
| median | 325000 |
| Q3 | 550000 |
| 95-th percentile | 1495000 |
| Maximum | 2.1474836 × 10⁹ |
| Range | 2.1474836 × 10⁹ |
| Interquartile range (IQR) | 385000 |
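The quartiles above are enough to sketch a conventional outlier screen for price. The example below applies Tukey's 1.5 × IQR rule, which is a common convention and not something the report itself uses, and compares the reported maximum with the largest signed 32-bit integer. Reading that maximum as a clipped or placeholder value is an assumption; the report only shows the number.

```python
# Values copied from the price quantile statistics.
q1, q3 = 165_000, 550_000
p95 = 1_495_000
reported_max = 2.1474836e9  # as displayed, rounded to 8 significant digits

iqr = q3 - q1

# Tukey's rule: flag values more than 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Largest signed 32-bit integer, for comparison with the reported maximum.
INT32_MAX = 2**31 - 1

print(f"IQR: {iqr}")
print(f"fences: [{lower_fence}, {upper_fence}]")
print(f"95th percentile above upper fence: {p95 > upper_fence}")
print(f"max within 100 of INT32_MAX: {abs(reported_max - INT32_MAX) < 100}")
```

Because the 95-th percentile already lies above the upper fence, this rule would flag more than 5% of listings, so a percentile-based or log-scale cutoff may suit this column better.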
Descriptive statistics
| Standard deviation | 2138893.2 |
|---|---|
| Coefficient of variation (CV) | 4.0803348 |
| Kurtosis | 492423.19 |
| Mean | 524195.52 |
| Median Absolute Deviation (MAD) | 180000 |
| Skewness | 546.30306 |
| Sum | 1.1662517 × 10¹² |
| Variance | 4.5748642 × 10¹² |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 350000 | 15430 | 0.7% |
| 250000 | 15254 | 0.7% |
| 325000 | 14040 | 0.6% |
| 225000 | 13939 | 0.6% |
| 450000 | 13213 | 0.6% |
| 275000 | 13084 | 0.6% |
| 425000 | 12769 | 0.6% |
| 375000 | 12055 | 0.5% |
| 299900 | 11791 | 0.5% |
| 150000 | 11315 | 0.5% |
| Other values (102127) | 2091951 | 94.0% |
| Value | Count | Frequency (%) |
| 0 | 280 | < 0.1% |
| 1 | 508 | < 0.1% |
| 2 | 14 | < 0.1% |
| 3 | 3 | < 0.1% |
| 4 | 2 | < 0.1% |
| 5 | 5 | < 0.1% |
| 6 | 6 | < 0.1% |
| 7 | 2 | < 0.1% |
| 8 | 14 | < 0.1% |
| 9 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 2147483600 | 1 | < 0.1% |
| 1000000000 | 1 | < 0.1% |
| 875000000 | 1 | < 0.1% |
| 515000000 | 1 | < 0.1% |
| 295000000 | 1 | < 0.1% |
| 281500000 | 1 | < 0.1% |
| 250000000 | 1 | < 0.1% |
| 212500000 | 1 | < 0.1% |
| 169000000 | 1 | < 0.1% |
| 165000000 | 1 | < 0.1% |
bed
Real number (ℝ)
HIGH CORRELATION  MISSING  SKEWED 
| Distinct | 99 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 481317 |
| Missing (%) | 21.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.2758407 |
| Minimum | 1 |
| Maximum | 473 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3 |
| median | 3 |
| Q3 | 4 |
| 95-th percentile | 5 |
| Maximum | 473 |
| Range | 472 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.5672739 |
|---|---|
| Coefficient of variation (CV) | 0.47843408 |
| Kurtosis | 12971.516 |
| Mean | 3.2758407 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 56.654813 |
| Sum | 5716555 |
| Variance | 2.4563473 |
| Monotonicity | Not monotonic |
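The descriptive statistics for bed appear to be computed over non-missing values only, whereas the frequency tables use all rows as the denominator. The sketch below checks this reading by rebuilding the mean, coefficient of variation and variance from the reported sum, missing count and standard deviation. The variable names are illustrative.

```python
# Figures copied from the bed overview and descriptive statistics.
n_rows = 2_226_382
missing = 481_317
total = 5_716_555   # "Sum"
std = 1.5672739     # "Standard deviation"

non_missing = n_rows - missing
mean = total / non_missing   # mean over observed values only
cv = std / mean              # coefficient of variation
variance = std ** 2

print(non_missing, round(mean, 7), round(cv, 6), round(variance, 6))
```

The recomputed values agree with the reported mean (3.2758407), CV (0.47843408) and variance (2.4563473) up to the rounding of the inputs.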
| Value | Count | Frequency (%) |
| 3 | 753923 | 33.9% |
| 4 | 440566 | 19.8% |
| 2 | 311019 | 14.0% |
| 5 | 120637 | 5.4% |
| 1 | 65098 | 2.9% |
| 6 | 32209 | 1.4% |
| 7 | 8001 | 0.4% |
| 8 | 6103 | 0.3% |
| 9 | 2402 | 0.1% |
| 10 | 1378 | 0.1% |
| Other values (89) | 3729 | 0.2% |
| (Missing) | 481317 | 21.6% |
| Value | Count | Frequency (%) |
| 1 | 65098 | 2.9% |
| 2 | 311019 | 14.0% |
| 3 | 753923 | 33.9% |
| 4 | 440566 | 19.8% |
| 5 | 120637 | 5.4% |
| 6 | 32209 | 1.4% |
| 7 | 8001 | 0.4% |
| 8 | 6103 | 0.3% |
| 9 | 2402 | 0.1% |
| 10 | 1378 | 0.1% |
| Value | Count | Frequency (%) |
| 473 | 1 | < 0.1% |
| 444 | 2 | < 0.1% |
| 222 | 1 | < 0.1% |
| 212 | 1 | < 0.1% |
| 210 | 1 | < 0.1% |
| 190 | 1 | < 0.1% |
| 148 | 1 | < 0.1% |
| 142 | 1 | < 0.1% |
| 136 | 1 | < 0.1% |
| 123 | 1 | < 0.1% |
bath
Real number (ℝ)
HIGH CORRELATION  MISSING  SKEWED 
| Distinct | 86 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 511771 |
| Missing (%) | 23.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.4964403 |
| Minimum | 1 |
| Maximum | 830 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 4 |
| Maximum | 830 |
| Range | 829 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.6525725 |
|---|---|
| Coefficient of variation (CV) | 0.66197158 |
| Kurtosis | 65874.151 |
| Mean | 2.4964403 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 152.415 |
| Sum | 4280424 |
| Variance | 2.730996 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 746294 | 33.5% |
| 3 | 471821 | 21.2% |
| 1 | 260131 | 11.7% |
| 4 | 157290 | 7.1% |
| 5 | 45563 | 2.0% |
| 6 | 17080 | 0.8% |
| 7 | 7114 | 0.3% |
| 8 | 4078 | 0.2% |
| 9 | 1902 | 0.1% |
| 10 | 1038 | < 0.1% |
| Other values (76) | 2300 | 0.1% |
| (Missing) | 511771 | 23.0% |
| Value | Count | Frequency (%) |
| 1 | 260131 | 11.7% |
| 2 | 746294 | 33.5% |
| 3 | 471821 | 21.2% |
| 4 | 157290 | 7.1% |
| 5 | 45563 | 2.0% |
| 6 | 17080 | 0.8% |
| 7 | 7114 | 0.3% |
| 8 | 4078 | 0.2% |
| 9 | 1902 | 0.1% |
| 10 | 1038 | < 0.1% |
| Value | Count | Frequency (%) |
| 830 | 1 | < 0.1% |
| 752 | 1 | < 0.1% |
| 460 | 1 | < 0.1% |
| 222 | 2 | < 0.1% |
| 212 | 2 | < 0.1% |
| 198 | 1 | < 0.1% |
| 175 | 1 | < 0.1% |
| 163 | 1 | < 0.1% |
| 157 | 1 | < 0.1% |
| 123 | 1 | < 0.1% |
acre_lot
Real number (ℝ)
MISSING  SKEWED 
| Distinct | 16057 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 325589 |
| Missing (%) | 14.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.223027 |
| Minimum | 0 |
| Maximum | 100000 |
| Zeros | 2226 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.05 |
| Q1 | 0.15 |
| median | 0.26 |
| Q3 | 0.98 |
| 95-th percentile | 14.02 |
| Maximum | 100000 |
| Range | 100000 |
| Interquartile range (IQR) | 0.83 |
Descriptive statistics
| Standard deviation | 762.8238 |
|---|---|
| Coefficient of variation (CV) | 50.109862 |
| Kurtosis | 12542.323 |
| Mean | 15.223027 |
| Median Absolute Deviation (MAD) | 0.16 |
| Skewness | 106.28028 |
| Sum | 28935824 |
| Variance | 581900.15 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.17 | 66180 | 3.0% |
| 0.14 | 65258 | 2.9% |
| 0.16 | 55864 | 2.5% |
| 0.23 | 55742 | 2.5% |
| 0.15 | 52191 | 2.3% |
| 0.18 | 49004 | 2.2% |
| 0.11 | 46560 | 2.1% |
| 0.19 | 44391 | 2.0% |
| 0.13 | 42480 | 1.9% |
| 0.2 | 41376 | 1.9% |
| Other values (16047) | 1381747 | 62.1% |
| (Missing) | 325589 | 14.6% |
| Value | Count | Frequency (%) |
| 0 | 2226 | 0.1% |
| 0.01 | 8886 | 0.4% |
| 0.02 | 20582 | 0.9% |
| 0.03 | 24737 | 1.1% |
| 0.04 | 24488 | 1.1% |
| 0.05 | 23190 | 1.0% |
| 0.06 | 23920 | 1.1% |
| 0.07 | 24531 | 1.1% |
| 0.08 | 21848 | 1.0% |
| 0.09 | 27516 | 1.2% |
| Value | Count | Frequency (%) |
| 100000 | 52 | < 0.1% |
| 99999 | 7 | < 0.1% |
| 98135 | 1 | < 0.1% |
| 96120 | 1 | < 0.1% |
| 95832 | 1 | < 0.1% |
| 94457 | 1 | < 0.1% |
| 93248 | 1 | < 0.1% |
| 91178 | 1 | < 0.1% |
| 91040 | 1 | < 0.1% |
| 90522 | 1 | < 0.1% |
street
Real number (ℝ)
| Distinct | 2001358 |
|---|---|
| Distinct (%) | 90.3% |
| Missing | 10866 |
| Missing (%) | 0.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1012324.9 |
| Minimum | 0 |
| Maximum | 2001357 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 101044.75 |
| Q1 | 506312.75 |
| median | 1012765.5 |
| Q3 | 1521173.2 |
| 95-th percentile | 1913044.2 |
| Maximum | 2001357 |
| Range | 2001357 |
| Interquartile range (IQR) | 1014860.5 |
Descriptive statistics
| Standard deviation | 583763.48 |
|---|---|
| Coefficient of variation (CV) | 0.57665624 |
| Kurtosis | -1.2130069 |
| Mean | 1012324.9 |
| Median Absolute Deviation (MAD) | 507433 |
| Skewness | -0.0092302458 |
| Sum | 2.2428221 × 10¹² |
| Variance | 3.407798 × 10¹¹ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1916862 | 158 | < 0.1% |
| 1861860 | 142 | < 0.1% |
| 1951128 | 127 | < 0.1% |
| 1801524 | 98 | < 0.1% |
| 793078 | 87 | < 0.1% |
| 824804 | 81 | < 0.1% |
| 222498 | 77 | < 0.1% |
| 6365 | 76 | < 0.1% |
| 274432 | 76 | < 0.1% |
| 1862077 | 75 | < 0.1% |
| Other values (2001348) | 2214519 | 99.5% |
| (Missing) | 10866 | 0.5% |
| Value | Count | Frequency (%) |
| 0 | 2 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 2 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 2001357 | 1 | < 0.1% |
| 2001356 | 1 | < 0.1% |
| 2001355 | 1 | < 0.1% |
| 2001354 | 1 | < 0.1% |
| 2001353 | 1 | < 0.1% |
| 2001352 | 1 | < 0.1% |
| 2001351 | 1 | < 0.1% |
| 2001350 | 1 | < 0.1% |
| 2001349 | 1 | < 0.1% |
| 2001348 | 3 | < 0.1% |
city
Text
| Distinct | 20098 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 1407 |
| Missing (%) | 0.1% |
| Memory size | 17.0 MiB |
Length
| Max length | 49 |
|---|---|
| Median length | 44 |
| Mean length | 9.0651117 |
| Min length | 1 |
Characters and Unicode
| Total characters | 20169647 |
|---|---|
| Distinct characters | 69 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
Unique
| Unique | 2947 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Adjuntas |
|---|---|
| 2nd row | Adjuntas |
| 3rd row | Juana Diaz |
| 4th row | Ponce |
| 5th row | Mayaguez |
| Value | Count | Frequency (%) |
| city | 70521 | 2.4% |
| beach | 41006 | 1.4% |
| new | 35840 | 1.2% |
| san | 35513 | 1.2% |
| saint | 29933 | 1.0% |
| lake | 28214 | 1.0% |
| houston | 23930 | 0.8% |
| springs | 21440 | 0.7% |
| york | 20171 | 0.7% |
| fort | 19712 | 0.7% |
| Other values (14952) | 2617043 | 88.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 1862865 | 9.2% |
| a | 1782114 | 8.8% |
| o | 1505184 | 7.5% |
| n | 1470469 | 7.3% |
| l | 1348266 | 6.7% |
| i | 1238942 | 6.1% |
| r | 1234849 | 6.1% |
| t | 1049447 | 5.2% |
| s | 873953 | 4.3% |
| (space) | 718355 | 3.6% |
| Other values (59) | 7085203 | 35.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 16497973 | 81.8% |
| Uppercase Letter | 2950086 | 14.6% |
| Space Separator | 718355 | 3.6% |
| Other Punctuation | 2327 | < 0.1% |
| Decimal Number | 890 | < 0.1% |
| Dash Punctuation | 16 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 1862865 | 11.3% |
| a | 1782114 | 10.8% |
| o | 1505184 | 9.1% |
| n | 1470469 | 8.9% |
| l | 1348266 | 8.2% |
| i | 1238942 | 7.5% |
| r | 1234849 | 7.5% |
| t | 1049447 | 6.4% |
| s | 873953 | 5.3% |
| d | 486422 | 2.9% |
| Other values (18) | 3645462 | 22.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 337267 | 11.4% |
| S | 291749 | 9.9% |
| B | 234005 | 7.9% |
| P | 217612 | 7.4% |
| M | 212579 | 7.2% |
| L | 186356 | 6.3% |
| H | 169810 | 5.8% |
| A | 142442 | 4.8% |
| W | 138386 | 4.7% |
| R | 124476 | 4.2% |
| Other values (16) | 895404 | 30.4% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 157 | 17.6% |
| 4 | 110 | 12.4% |
| 2 | 97 | 10.9% |
| 9 | 95 | 10.7% |
| 3 | 95 | 10.7% |
| 6 | 71 | 8.0% |
| 5 | 70 | 7.9% |
| 8 | 70 | 7.9% |
| 0 | 64 | 7.2% |
| 7 | 61 | 6.9% |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 2003 | 86.1% |
| . | 312 | 13.4% |
| , | 12 | 0.5% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 718355 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 16 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 19448059 | 96.4% |
| Common | 721588 | 3.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 1862865 | 9.6% |
| a | 1782114 | 9.2% |
| o | 1505184 | 7.7% |
| n | 1470469 | 7.6% |
| l | 1348266 | 6.9% |
| i | 1238942 | 6.4% |
| r | 1234849 | 6.3% |
| t | 1049447 | 5.4% |
| s | 873953 | 4.5% |
| d | 486422 | 2.5% |
| Other values (44) | 6595548 | 33.9% |
Common
| Value | Count | Frequency (%) |
| (space) | 718355 | 99.6% |
| ' | 2003 | 0.3% |
| . | 312 | < 0.1% |
| 1 | 157 | < 0.1% |
| 4 | 110 | < 0.1% |
| 2 | 97 | < 0.1% |
| 9 | 95 | < 0.1% |
| 3 | 95 | < 0.1% |
| 6 | 71 | < 0.1% |
| 5 | 70 | < 0.1% |
| Other values (5) | 223 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 20169643 | > 99.9% |
| None | 4 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 1862865 | 9.2% |
| a | 1782114 | 8.8% |
| o | 1505184 | 7.5% |
| n | 1470469 | 7.3% |
| l | 1348266 | 6.7% |
| i | 1238942 | 6.1% |
| r | 1234849 | 6.1% |
| t | 1049447 | 5.2% |
| s | 873953 | 4.3% |
| (space) | 718355 | 3.6% |
| Other values (57) | 7085199 | 35.1% |
None
| Value | Count | Frequency (%) |
| ó | 3 | 75.0% |
| Ã | 1 | 25.0% |
state
Text
| Distinct | 55 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 8 |
| Missing (%) | < 0.1% |
| Memory size | 17.0 MiB |
Length
| Max length | 20 |
|---|---|
| Median length | 13 |
| Mean length | 8.3506293 |
| Min length | 4 |
Characters and Unicode
| Total characters | 18591624 |
|---|---|
| Distinct characters | 47 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Puerto Rico |
|---|---|
| 2nd row | Puerto Rico |
| 3rd row | Puerto Rico |
| 4th row | Puerto Rico |
| 5th row | Puerto Rico |
| Value | Count | Frequency (%) |
| florida | 249432 | 9.7% |
| california | 227215 | 8.8% |
| texas | 208335 | 8.1% |
| new | 176075 | 6.8% |
| carolina | 128112 | 5.0% |
| york | 103159 | 4.0% |
| north | 90013 | 3.5% |
| illinois | 85280 | 3.3% |
| virginia | 81072 | 3.1% |
| georgia | 80977 | 3.1% |
| Other values (51) | 1147586 | 44.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 2446490 | 13.2% |
| i | 2125529 | 11.4% |
| o | 1656872 | 8.9% |
| n | 1526221 | 8.2% |
| r | 1291528 | 6.9% |
| s | 1143277 | 6.1% |
| e | 1052159 | 5.7% |
| l | 1030116 | 5.5% |
| t | 438216 | 2.4% |
| h | 419464 | 2.3% |
| Other values (37) | 5461752 | 29.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 15670111 | 84.3% |
| Uppercase Letter | 2570631 | 13.8% |
| Space Separator | 350882 | 1.9% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 2446490 | 15.6% |
| i | 2125529 | 13.6% |
| o | 1656872 | 10.6% |
| n | 1526221 | 9.7% |
| r | 1291528 | 8.2% |
| s | 1143277 | 7.3% |
| e | 1052159 | 6.7% |
| l | 1030116 | 6.6% |
| t | 438216 | 2.8% |
| h | 419464 | 2.7% |
| Other values (14) | 2540239 | 16.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 408253 | 15.9% |
| N | 287064 | 11.2% |
| M | 267532 | 10.4% |
| F | 249432 | 9.7% |
| T | 249299 | 9.7% |
| I | 152965 | 6.0% |
| A | 132504 | 5.2% |
| O | 128510 | 5.0% |
| W | 121199 | 4.7% |
| Y | 103159 | 4.0% |
| Other values (12) | 470714 | 18.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 350882 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 18240742 | 98.1% |
| Common | 350882 | 1.9% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 2446490 | 13.4% |
| i | 2125529 | 11.7% |
| o | 1656872 | 9.1% |
| n | 1526221 | 8.4% |
| r | 1291528 | 7.1% |
| s | 1143277 | 6.3% |
| e | 1052159 | 5.8% |
| l | 1030116 | 5.6% |
| t | 438216 | 2.4% |
| h | 419464 | 2.3% |
| Other values (36) | 5110870 | 28.0% |
Common
| Value | Count | Frequency (%) |
| (space) | 350882 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 18591624 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 2446490 | 13.2% |
| i | 2125529 | 11.4% |
| o | 1656872 | 8.9% |
| n | 1526221 | 8.2% |
| r | 1291528 | 6.9% |
| s | 1143277 | 6.1% |
| e | 1052159 | 5.7% |
| l | 1030116 | 5.5% |
| t | 438216 | 2.4% |
| h | 419464 | 2.3% |
| Other values (37) | 5461752 | 29.4% |
zip_code
Real number (ℝ)
| Distinct | 30334 |
|---|---|
| Distinct (%) | 1.4% |
| Missing | 299 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 52186.676 |
| Minimum | 0 |
| Maximum | 99999 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 8512 |
| Q1 | 29617 |
| median | 48382 |
| Q3 | 78070 |
| 95-th percentile | 95969 |
| Maximum | 99999 |
| Range | 99999 |
| Interquartile range (IQR) | 48453 |
Descriptive statistics
| Standard deviation | 28954.085 |
|---|---|
| Coefficient of variation (CV) | 0.55481756 |
| Kurtosis | -1.3131808 |
| Mean | 52186.676 |
| Median Absolute Deviation (MAD) | 25950 |
| Skewness | 0.092234425 |
| Sum | 1.1617187 × 10¹¹ |
| Variance | 8.3833901 × 10⁸ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 33993 | 2472 | 0.1% |
| 33981 | 2282 | 0.1% |
| 33974 | 1996 | 0.1% |
| 33160 | 1718 | 0.1% |
| 32909 | 1707 | 0.1% |
| 33139 | 1589 | 0.1% |
| 34288 | 1424 | 0.1% |
| 73099 | 1413 | 0.1% |
| 33953 | 1380 | 0.1% |
| 32908 | 1377 | 0.1% |
| Other values (30324) | 2208725 | 99.2% |
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 601 | 2 | < 0.1% |
| 602 | 42 | < 0.1% |
| 603 | 48 | < 0.1% |
| 604 | 1 | < 0.1% |
| 605 | 2 | < 0.1% |
| 606 | 5 | < 0.1% |
| 610 | 9 | < 0.1% |
| 612 | 65 | < 0.1% |
| 613 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 99999 | 37 | < 0.1% |
| 99950 | 2 | < 0.1% |
| 99929 | 18 | < 0.1% |
| 99927 | 2 | < 0.1% |
| 99925 | 6 | < 0.1% |
| 99923 | 3 | < 0.1% |
| 99921 | 7 | < 0.1% |
| 99919 | 13 | < 0.1% |
| 99918 | 21 | < 0.1% |
| 99903 | 8 | < 0.1% |
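Because zip_code is profiled as a real number, values such as 601 lose the leading zeros that US ZIP codes carry (Puerto Rico codes, for instance, begin with 00). A minimal sketch of restoring a five-digit string is shown below. The helper name `format_zip` is hypothetical, and the code assumes every non-missing value is a five-digit US ZIP code stored as a number.

```python
import math

def format_zip(value):
    """Return a zero-padded 5-digit ZIP string, or None for missing values.

    Assumes `value` is None, NaN, or a number holding a 5-digit US ZIP code.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return f"{int(value):05d}"

print(format_zip(601.0))         # 00601
print(format_zip(99354.0))       # 99354
print(format_zip(float("nan")))  # None
```

Treating the column as a string after this step also keeps the profiler from computing means and variances that have no meaning for postal codes.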
house_size
Real number (ℝ)
HIGH CORRELATION  MISSING  SKEWED 
| Distinct | 12061 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 568484 |
| Missing (%) | 25.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2714.4713 |
| Minimum | 4 |
| Maximum | 1.0404004 × 10⁹ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 847 |
| Q1 | 1300 |
| median | 1760 |
| Q3 | 2413 |
| 95-th percentile | 4008 |
| Maximum | 1.0404004 × 10⁹ |
| Range | 1.0404004 × 10⁹ |
| Interquartile range (IQR) | 1113 |
Descriptive statistics
| Standard deviation | 808163.52 |
|---|---|
| Coefficient of variation (CV) | 297.72409 |
| Kurtosis | 1656699.7 |
| Mean | 2714.4713 |
| Median Absolute Deviation (MAD) | 528 |
| Skewness | 1286.9001 |
| Sum | 4.5003166 × 10⁹ |
| Variance | 6.5312827 × 10¹¹ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1200 | 8938 | 0.4% |
| 1500 | 6316 | 0.3% |
| 1800 | 6272 | 0.3% |
| 1400 | 6108 | 0.3% |
| 1600 | 5630 | 0.3% |
| 1440 | 5590 | 0.3% |
| 1000 | 5567 | 0.3% |
| 960 | 5394 | 0.2% |
| 1344 | 5383 | 0.2% |
| 1100 | 5096 | 0.2% |
| Other values (12051) | 1597604 | 71.8% |
| (Missing) | 568484 | 25.5% |
| Value | Count | Frequency (%) |
| 4 | 1 | < 0.1% |
| 100 | 22 | < 0.1% |
| 101 | 2 | < 0.1% |
| 102 | 1 | < 0.1% |
| 104 | 1 | < 0.1% |
| 108 | 1 | < 0.1% |
| 110 | 1 | < 0.1% |
| 111 | 3 | < 0.1% |
| 112 | 1 | < 0.1% |
| 115 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1040400400 | 1 | < 0.1% |
| 12992200 | 1 | < 0.1% |
| 9842382 | 1 | < 0.1% |
| 7971480 | 1 | < 0.1% |
| 3484800 | 1 | < 0.1% |
| 3434706 | 1 | < 0.1% |
| 1560780 | 1 | < 0.1% |
| 1454468 | 1 | < 0.1% |
| 1450112 | 1 | < 0.1% |
| 1306800 | 1 | < 0.1% |
prev_sold_date
Text
MISSING 
| Distinct | 14954 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 734297 |
| Missing (%) | 33.0% |
| Memory size | 17.0 MiB |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 14920850 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 2386 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | 2020-02-28 |
|---|---|
| 2nd row | 2019-06-28 |
| 3rd row | 2021-09-15 |
| 4th row | 2021-03-15 |
| 5th row | 2013-10-11 |
| Value | Count | Frequency (%) |
| 2022-03-31 | 17171 | 1.2% |
| 2022-04-15 | 16297 | 1.1% |
| 2022-04-22 | 15762 | 1.1% |
| 2022-04-08 | 15038 | 1.0% |
| 2022-02-28 | 14144 | 0.9% |
| 2022-04-29 | 13783 | 0.9% |
| 2021-11-30 | 12856 | 0.9% |
| 2022-03-25 | 12558 | 0.8% |
| 2022-02-25 | 12278 | 0.8% |
| 2021-11-19 | 12076 | 0.8% |
| Other values (14944) | 1350122 | 90.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 4000417 | 26.8% |
| 0 | 3356517 | 22.5% |
| - | 2984170 | 20.0% |
| 1 | 2111038 | 14.1% |
| 3 | 492966 | 3.3% |
| 4 | 454577 | 3.0% |
| 9 | 400090 | 2.7% |
| 8 | 331671 | 2.2% |
| 5 | 292909 | 2.0% |
| 7 | 263022 | 1.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 11936680 | 80.0% |
| Dash Punctuation | 2984170 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 4000417 | 33.5% |
| 0 | 3356517 | 28.1% |
| 1 | 2111038 | 17.7% |
| 3 | 492966 | 4.1% |
| 4 | 454577 | 3.8% |
| 9 | 400090 | 3.4% |
| 8 | 331671 | 2.8% |
| 5 | 292909 | 2.5% |
| 7 | 263022 | 2.2% |
| 6 | 233473 | 2.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 2984170 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 14920850 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 4000417 | 26.8% |
| 0 | 3356517 | 22.5% |
| - | 2984170 | 20.0% |
| 1 | 2111038 | 14.1% |
| 3 | 492966 | 3.3% |
| 4 | 454577 | 3.0% |
| 9 | 400090 | 2.7% |
| 8 | 331671 | 2.2% |
| 5 | 292909 | 2.0% |
| 7 | 263022 | 1.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 14920850 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 4000417 | 26.8% |
| 0 | 3356517 | 22.5% |
| - | 2984170 | 20.0% |
| 1 | 2111038 | 14.1% |
| 3 | 492966 | 3.3% |
| 4 | 454577 | 3.0% |
| 9 | 400090 | 2.7% |
| 8 | 331671 | 2.2% |
| 5 | 292909 | 2.0% |
| 7 | 263022 | 1.8% |
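prev_sold_date is profiled as Text, yet every value is exactly 10 characters long and the samples follow the YYYY-MM-DD pattern, so it can likely be parsed as a date before further analysis. The sketch below uses only the standard library; the helper name `parse_sold_date` is hypothetical, and missing values are assumed to arrive as None or NaN.

```python
import math
from datetime import date

def parse_sold_date(value):
    """Parse a YYYY-MM-DD string; return None for missing values (None or NaN)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return date.fromisoformat(value)

# Consistency check on the reported character totals:
# each non-missing value contributes 10 characters, two of them dashes.
non_missing = 2_226_382 - 734_297
total_chars = 10 * non_missing
dash_chars = 2 * non_missing

print(parse_sold_date("2020-02-28"))
print(total_chars, dash_chars)
```

The character totals reproduce the 14920850 total characters and 2984170 dashes listed above, which supports the fixed-format reading.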
| | acre_lot | bath | bed | brokered_by | house_size | price | status | street | zip_code |
|---|---|---|---|---|---|---|---|---|---|
| acre_lot | 1.000 | 0.108 | 0.149 | -0.004 | 0.252 | -0.070 | 0.003 | 0.159 | -0.072 |
| bath | 0.108 | 1.000 | 0.598 | -0.001 | 0.759 | 0.542 | 0.001 | -0.001 | 0.008 |
| bed | 0.149 | 0.598 | 1.000 | 0.007 | 0.715 | 0.350 | 0.003 | -0.000 | 0.004 |
| brokered_by | -0.004 | -0.001 | 0.007 | 1.000 | -0.001 | 0.001 | 0.081 | 0.000 | 0.065 |
| house_size | 0.252 | 0.759 | 0.715 | -0.001 | 1.000 | 0.535 | 0.000 | -0.003 | 0.011 |
| price | -0.070 | 0.542 | 0.350 | 0.001 | 0.535 | 1.000 | 0.000 | -0.126 | 0.128 |
| status | 0.003 | 0.001 | 0.003 | 0.081 | 0.000 | 0.000 | 1.000 | 0.119 | 0.123 |
| street | 0.159 | -0.001 | -0.000 | 0.000 | -0.003 | -0.126 | 0.119 | 1.000 | 0.002 |
| zip_code | -0.072 | 0.008 | 0.004 | 0.065 | 0.011 | 0.128 | 0.123 | 0.002 | 1.000 |
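The High correlation alerts can be traced back to this matrix. The sketch below lists every variable pair whose absolute coefficient reaches a cutoff. The cutoff of 0.5 is an assumption chosen because it reproduces the alerts in this report; it is not read from the report's configuration.

```python
names = ["acre_lot", "bath", "bed", "brokered_by", "house_size",
         "price", "status", "street", "zip_code"]

# Coefficients copied row by row from the correlation table.
matrix = [
    [1.000, 0.108, 0.149, -0.004, 0.252, -0.070, 0.003, 0.159, -0.072],
    [0.108, 1.000, 0.598, -0.001, 0.759, 0.542, 0.001, -0.001, 0.008],
    [0.149, 0.598, 1.000, 0.007, 0.715, 0.350, 0.003, 0.000, 0.004],
    [-0.004, -0.001, 0.007, 1.000, -0.001, 0.001, 0.081, 0.000, 0.065],
    [0.252, 0.759, 0.715, -0.001, 1.000, 0.535, 0.000, -0.003, 0.011],
    [-0.070, 0.542, 0.350, 0.001, 0.535, 1.000, 0.000, -0.126, 0.128],
    [0.003, 0.001, 0.003, 0.081, 0.000, 0.000, 1.000, 0.119, 0.123],
    [0.159, -0.001, 0.000, 0.000, -0.003, -0.126, 0.119, 1.000, 0.002],
    [-0.072, 0.008, 0.004, 0.065, 0.011, 0.128, 0.123, 0.002, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, see the note above

def high_pairs(names, matrix, threshold):
    """Return (name_a, name_b, r) for each pair above the diagonal with |r| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs

for a, b, r in high_pairs(names, matrix, THRESHOLD):
    print(f"{a} ~ {b}: {r:.3f}")
```

With this cutoff, bath and house_size each appear in three pairs while bed and price each appear in two, consistent with the counts in the alerts.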
| | brokered_by | status | price | bed | bath | acre_lot | street | city | state | zip_code | house_size | prev_sold_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 103378.0 | for_sale | 105000.0 | 3.0 | 2.0 | 0.12 | 1962661.0 | Adjuntas | Puerto Rico | 601.0 | 920.0 | NaN |
| 1 | 52707.0 | for_sale | 80000.0 | 4.0 | 2.0 | 0.08 | 1902874.0 | Adjuntas | Puerto Rico | 601.0 | 1527.0 | NaN |
| 2 | 103379.0 | for_sale | 67000.0 | 2.0 | 1.0 | 0.15 | 1404990.0 | Juana Diaz | Puerto Rico | 795.0 | 748.0 | NaN |
| 3 | 31239.0 | for_sale | 145000.0 | 4.0 | 2.0 | 0.10 | 1947675.0 | Ponce | Puerto Rico | 731.0 | 1800.0 | NaN |
| 4 | 34632.0 | for_sale | 65000.0 | 6.0 | 2.0 | 0.05 | 331151.0 | Mayaguez | Puerto Rico | 680.0 | NaN | NaN |
| 5 | 103378.0 | for_sale | 179000.0 | 4.0 | 3.0 | 0.46 | 1850806.0 | San Sebastian | Puerto Rico | 612.0 | 2520.0 | NaN |
| 6 | 1205.0 | for_sale | 50000.0 | 3.0 | 1.0 | 0.20 | 1298094.0 | Ciales | Puerto Rico | 639.0 | 2040.0 | NaN |
| 7 | 50739.0 | for_sale | 71600.0 | 3.0 | 2.0 | 0.08 | 1048466.0 | Ponce | Puerto Rico | 731.0 | 1050.0 | NaN |
| 8 | 81909.0 | for_sale | 100000.0 | 2.0 | 1.0 | 0.09 | 734904.0 | Ponce | Puerto Rico | 730.0 | 1092.0 | NaN |
| 9 | 65672.0 | for_sale | 300000.0 | 5.0 | 3.0 | 7.46 | 1946226.0 | Las Marias | Puerto Rico | 670.0 | 5403.0 | NaN |
| | brokered_by | status | price | bed | bath | acre_lot | street | city | state | zip_code | house_size | prev_sold_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2226372 | 108243.0 | sold | 425000.0 | 3.0 | 3.0 | 0.06 | 970797.0 | Richland | Washington | 99354.0 | 1876.0 | 2022-02-14 |
| 2226373 | 16235.0 | sold | 305000.0 | 4.0 | 2.0 | 0.42 | 353937.0 | Richland | Washington | 99354.0 | 2000.0 | 2022-02-11 |
| 2226374 | 53860.0 | sold | 310000.0 | 3.0 | 1.0 | 0.21 | 500240.0 | Richland | Washington | 99354.0 | 1152.0 | 2022-02-11 |
| 2226375 | 60631.0 | sold | 385000.0 | 4.0 | 2.0 | 0.21 | 210890.0 | Richland | Washington | 99354.0 | 1656.0 | 2022-03-28 |
| 2226376 | 85499.0 | sold | 339900.0 | 4.0 | 2.0 | 0.20 | 41160.0 | Richland | Washington | 99354.0 | 2780.0 | 2022-03-28 |
| 2226377 | 23009.0 | sold | 359900.0 | 4.0 | 2.0 | 0.33 | 353094.0 | Richland | Washington | 99354.0 | 3600.0 | 2022-03-25 |
| 2226378 | 18208.0 | sold | 350000.0 | 3.0 | 2.0 | 0.10 | 1062149.0 | Richland | Washington | 99354.0 | 1616.0 | 2022-03-25 |
| 2226379 | 76856.0 | sold | 440000.0 | 6.0 | 3.0 | 0.50 | 405677.0 | Richland | Washington | 99354.0 | 3200.0 | 2022-03-24 |
| 2226380 | 53618.0 | sold | 179900.0 | 2.0 | 1.0 | 0.09 | 761379.0 | Richland | Washington | 99354.0 | 933.0 | 2022-03-24 |
| 2226381 | 108243.0 | sold | 580000.0 | 5.0 | 3.0 | 0.31 | 307704.0 | Richland | Washington | 99354.0 | 3615.0 | 2022-03-23 |